Building a Dictionary using XML Technology
Abstract
In this article we describe the workflow implemented to convert a dictionary saved as a PDF file into an XML document, to import that document into an XML-aware database, and to edit, add, and delete entries. The conversion was challenging given the format of the PDF file and the fine-grained detail of the XML schema that was used, so an iterative filtering approach was adopted. To store the dictionary we decided to use an XML-aware database (eXist-DB) that stores each dictionary entry as a separate resource. It can be queried through a web interface developed in XQuery. The lexicographers can edit entries using the oXygen XML editor, reading them from and storing them directly in the database. To guarantee incremental backups, a mechanism was defined to import the XML database into a Git repository. Finally, a couple of programs were created to prepare regular reports on the dictionary revision process and to back the dictionary up in the Git repository.
1998 ACM Subject Classification I.7.2 Document Preparation / Markup languages
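As a rough illustration of the "one resource per entry" storage layout mentioned in the abstract, the following Python sketch splits a dictionary document into one serialized XML fragment per entry. The abstract does not give the schema, so the element name `entry`, the attribute `id`, the sample data, and the function `split_entries` are all hypothetical; uploading the resulting resources to eXist-DB is not shown.

```python
import xml.etree.ElementTree as ET


def split_entries(xml_text, entry_tag="entry", id_attr="id"):
    """Return {resource_name: serialized_entry} for each entry element.

    entry_tag and id_attr are assumed names, not taken from the article.
    """
    root = ET.fromstring(xml_text)
    resources = {}
    for position, entry in enumerate(root.iter(entry_tag), start=1):
        # Fall back to a positional name when the entry has no identifier.
        key = entry.get(id_attr) or f"entry-{position:05d}"
        # Drop the whitespace that followed the entry in the source document.
        entry.tail = None
        resources[f"{key}.xml"] = ET.tostring(entry, encoding="unicode")
    return resources


# Hypothetical sample document, for illustration only.
SAMPLE = """<dictionary>
  <entry id="casa"><form>casa</form><sense>house</sense></entry>
  <entry id="gato"><form>gato</form><sense>cat</sense></entry>
</dictionary>"""

if __name__ == "__main__":
    for name, xml in split_entries(SAMPLE).items():
        print(name, xml)
```

Keeping each entry in its own small resource means that an edit touches only one file, which is presumably also what makes incremental backups in a version control system practical.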
Similar resources
Combining Efficient XML Compression with Query Processing
This paper describes a new XML compression scheme that offers both high compression ratios and short query response time. Its core is a fully reversible transform featuring substitution of every word in an XML document using a semi-dynamic dictionary, effective encoding of dictionary indices, as well as numbers, dates and times found in the document, and grouping data within the same structural...
Ein XML-basiertes Datenbanksystem für digitale Wörterbücher - Ein Werkstattbericht aus dem Institut für Deutsche Sprache (An XML-Based Database System for Online Dictionaries - A Report on Lexicographic Work at the Institute for German Language)
Abstract: The Online-Wortschatz-Informationssystem Deutsch (OWID) is a digital dictionary portal of the Institut für Deutsche Sprache. All lexicographic data brought together in it is structured in fine-grained XML. Storage, management, and retrieval of this data are handled by the Oracle-based Electronic Dictionary Administration System (EDAS). This contribution...
Dictionary Building with the Jibiki Platform: the GDEF case
This paper presents the use of the “Jibiki” generic dictionary online development platform in the case of the GDEF Estonian-French bilingual dictionary building project. This platform has been developed mainly by Mathieu Mangeot and Gilles Sérasset based on their research work in the domain. The platform is generic and thus can be used in (almost) any kind of dictionary development project from...
Tvärslå – defining an XML exchange format and then building an on-line Nordic dictionary
Tvärslå is a dynamically expandable multilingual on-line dictionary, composed of all dictionaries used and developed in the Nordisk netordbog (Nordic Web Dictionary) project. Currently the languages included are Swedish, Danish, Norwegian, Icelandic, Finnish and English. Tvärslå can be used both interactively and called by the Tvärsök system [1]. This article describes the functionality of Tvär...
XML/XSL in the Dictionary: The Case of Discourse Markers
We describe our ongoing work on an application of XML/XSL technology to a dictionary, from whose source representation various views for the human reader as well as for automatic text generation and understanding are derived. Our case study is a dictionary of discourse markers, the words (often, but not always, conjunctions) that signal the presence of a discourse relation between adjacent span...